Elasticsearch Query DSL Syntax Notes
TLDR
- Query DSL Advantages: Compared to Query String, Query DSL supports nested queries, geospatial queries, custom scoring, and complex boolean logic, with a clear structure that is easy to maintain.
- Match Query: The core of full-text search; minimum_should_match is only effective under OR logic, with a floor value of 1.
- Multi Match: Provides several strategies, such as best_fields (default, takes the highest score), most_fields (sums scores), and cross_fields (cross-field search).
- Combined Fields: Term-centric; treats multiple text fields as a single field, suitable for cross-field searching.
- Nested Query: Solves the loss of correlation caused by the flattening of object types; must be used for nested type fields.
- Date Range Query: Use string formats consistently and use Date Math (e.g., ||/d) for rounding to avoid parsing errors caused by mixing numeric and string types.
- Performance Warning: wildcard and regexp queries have poor performance; avoid leading wildcards and complex regular expressions.
Query DSL vs Query String
In production environments, Query DSL (Domain Specific Language) provides more powerful features and clearer error feedback than Query String due to its JSON-structured nature.
1. Functional Differences
Certain advanced features can only be implemented via Query DSL:
- Nested Queries: Preserves the correlation of fields within nested objects.
- Geospatial Queries: Such as geo_distance.
- Custom Scoring: Uses function_score to customize relevance scoring.
- Complex Boolean Logic: Combines must, should, must_not, and filter via bool.
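As a minimal sketch of the boolean and geospatial features listed above, the request body can be built as a Python dict and serialized to JSON. The field names (title, tags, status, location) and the coordinates are assumptions made for illustration, not part of any real index.

```python
import json

# Hypothetical fields: "title" (text), "tags" and "status" (keyword),
# "location" (geo_point).
bool_query = {
    "query": {
        "bool": {
            # must: has to match and contributes to the score
            "must": [{"match": {"title": "wireless headphones"}}],
            # should: optional, raises the score when it matches
            "should": [{"term": {"tags": "featured"}}],
            # must_not: excludes matching documents, no scoring
            "must_not": [{"term": {"status": "discontinued"}}],
            # filter: has to match, but does not affect the score
            "filter": [
                {
                    "geo_distance": {
                        "distance": "10km",
                        "location": {"lat": 25.03, "lon": 121.56},
                    }
                }
            ],
        }
    }
}

print(json.dumps(bool_query, indent=2))
```

The printed JSON is what would be sent as the body of a _search request.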
Common Query DSL Syntax
1. Match Query - Full-Text Search
Used for full-text search; it performs tokenization and relevance scoring.
minimum_should_match Parameter
This parameter takes effect only when operator is "OR"; it controls the minimum number of terms that must match.
- Special Rules: The minimum match count has a floor of 1. For example, with 4 terms, setting -4 or -100% still requires at least 1 term to match.
- Percentage Calculation: Rounds down (floor). For example, with 4 terms and 75%, 4 × 0.75 = 3.0, so at least 3 must match; with 74%, 4 × 0.74 = 2.96, rounded down to 2.
- Multi-condition Combination: A format like 2<-25% 9<-3 means that with ≤ 2 tokens, 100% must match; with 3-9 tokens, at most 25% can be missing; with > 9 tokens, at most 3 can be missing.
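The rules above can be checked with a small calculator. This is a sketch that models the rules as stated in these notes, not Elasticsearch's own code; the function name required_matches is hypothetical, and it assumes conditional specs are written in ascending order with no spaces around "<".

```python
def required_matches(spec: str, term_count: int) -> int:
    """Return how many of term_count optional terms must match for a
    minimum_should_match spec, following the rules in these notes."""
    spec = spec.strip()

    if "<" in spec:
        # Conditional form, e.g. "2<-25% 9<-3". Each part applies only
        # when the term count is greater than its bound.
        result = term_count  # at or below the first bound: all required
        for part in spec.split():
            bound_text, sub_spec = part.split("<", 1)
            if term_count <= int(bound_text):
                break
            result = required_matches(sub_spec, term_count)
        return max(1, result)

    if spec.endswith("%"):
        percent = int(spec[:-1])
        # Integer division rounds the computed count down.
        count = term_count * abs(percent) // 100
        # A negative percentage is the share allowed to be missing.
        result = term_count - count if percent < 0 else count
    else:
        number = int(spec)
        # A negative integer is the number of terms allowed to be missing.
        result = term_count + number if number < 0 else number

    # The minimum match count has a floor of 1.
    return max(1, result)


for spec, terms in [("75%", 4), ("74%", 4), ("-4", 4), ("2<-25% 9<-3", 12)]:
    print(spec, terms, required_matches(spec, terms))
```

With the conditional spec, 12 terms fall past both bounds, so the last rule (-3) applies and 9 terms are required.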
lenient Parameter
Controls behavior when types do not match:
- false (default): Throws an error and the query fails.
- true: Ignores the query for that field without throwing an error; that field simply yields no matches.
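Putting the two parameters together, a match request body might look like the following sketch. The field name title and the search text are assumptions for illustration.

```python
import json

match_query = {
    "query": {
        "match": {
            "title": {  # hypothetical text field
                "query": "quick brown fox jumps",
                "operator": "or",  # minimum_should_match only matters with OR
                "minimum_should_match": "75%",  # 4 terms -> at least 3 must match
                "lenient": True,  # ignore type-mismatch errors instead of failing
            }
        }
    }
}

body = json.dumps(match_query, indent=2)
print(body)
```

Python's True is serialized as the JSON literal true, which is what Elasticsearch expects for lenient.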
2. Multi Match Query - Multi-Field Search
Searches for the same keyword across multiple fields.
- best_fields: Takes the score of the highest-scoring field (default).
- most_fields: Sums the scores of all fields.
- cross_fields: Treats multiple fields as one large field, suitable for cross-field matching like names or addresses.
WARNING
When the fields do not share the same search_analyzer, cross_fields groups the fields by analyzer and applies operator and minimum_should_match within each group, so all terms may need to appear within a single group, or even within a single field.
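The three strategies differ only in the type parameter, so a small builder makes them easy to compare. This is a sketch; the helper name multi_match and the fields first_name and last_name are hypothetical.

```python
import json


def multi_match(text, fields, match_type="best_fields", operator="or"):
    """Build a multi_match request body for the given strategy."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": match_type,  # best_fields | most_fields | cross_fields
                "fields": fields,
                "operator": operator,
            }
        }
    }


# A person's name split over two fields: cross_fields with operator "and"
# lets "Will" match first_name while "Smith" matches last_name.
full_name = multi_match(
    "Will Smith", ["first_name", "last_name"], "cross_fields", "and"
)
print(json.dumps(full_name, indent=2))
```

Calling multi_match("Will Smith", ["first_name", "last_name"]) without the last two arguments falls back to best_fields, the default strategy.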
3. Combined Fields Query - Cross-Field Term Search
Adopts a term-centric approach, treating multiple text fields as a single combined field.
- Limitations: All fields must be of text type and use the same search_analyzer.
- Execution Logic: With operator set to and, each term must appear in at least one field (the terms can be distributed across different fields).
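A combined_fields request body has the same shape as multi_match but no type parameter. In this sketch the fields title, abstract, and body are assumed to be text fields sharing one search analyzer.

```python
import json

combined_query = {
    "query": {
        "combined_fields": {
            "query": "database systems",
            # Hypothetical text fields that share the same search_analyzer.
            "fields": ["title", "abstract", "body"],
            # "and": every term must be found, in any of the listed fields.
            "operator": "and",
        }
    }
}

print(json.dumps(combined_query, indent=2))
```

Here "database" could match in title and "systems" in body, and the document would still qualify.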
4. Range Query - Range Search
Used for numeric and date queries.
- Date Format Pitfalls: If the index mapping specifies a format, the query parameters must align with it, or use the format parameter to override.
- Mixing Numeric and String Types: When no format is given, numeric values are interpreted as millisecond timestamps (so 2020 is not the year 2020); use string formats (e.g., "2025-01-01") consistently to avoid parsing errors.
- Time Precision Issues: If only the hour is provided (e.g., 2023-01-15T08), Elasticsearch fills in the missing components itself, and the resulting boundary may not be the one you intended. Specify the full time explicitly, or use Date Math rounding (e.g., ||/h).
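The recommendations above translate into request bodies like the following sketch. The field name created_at is an assumption, and the third query assumes the field uses the default date format.

```python
import json

# Relative range with Date Math: from the start of the day 7 days ago
# up to (not including) the start of today.
last_week = {
    "query": {"range": {"created_at": {"gte": "now-7d/d", "lt": "now/d"}}}
}

# String dates with an explicit format, so parsing does not depend on
# whatever format the mapping declares.
january = {
    "query": {
        "range": {
            "created_at": {
                "gte": "2025-01-01",
                "lt": "2025-02-01",
                "format": "yyyy-MM-dd",
            }
        }
    }
}

# A single hour, written as a full timestamp plus ||/h rounding instead
# of the partial value "2023-01-15T08".
one_hour = {
    "query": {
        "range": {
            "created_at": {
                "gte": "2023-01-15T08:00:00||/h",
                "lt": "2023-01-15T08:00:00||+1h/h",
            }
        }
    }
}

for body in (last_week, january, one_hour):
    print(json.dumps(body))
```

All bounds are strings, which sidesteps the millisecond-timestamp interpretation of bare numbers.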
5. Nested Query - Nested Object Search
Used to query nested type fields, solving the loss of correlation caused by the flattening of object types.
- When this issue occurs: When the data structure is an array of objects (e.g., a list of products in an order) and you need to ensure that the "product name" and its "corresponding price" match within the same array element.
- Solution: Define the field as a nested type and use the nested query.
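To see why flattening loses the correlation, the following sketch imitates in plain Python how an array of objects is indexed without the nested type, then shows a mapping that avoids it. The flatten helper is a simplified model, not Elasticsearch code, and the sub-field types in the mapping are assumptions.

```python
import json


def flatten(doc, array_field):
    """Mimic object-type indexing: one merged value list per sub-field."""
    flat = {}
    for element in doc[array_field]:
        for key, value in element.items():
            flat.setdefault(f"{array_field}.{key}", []).append(value)
    return flat


doc = {
    "comments": [
        {"author": "John", "rating": 5},
        {"author": "Alice", "rating": 3},
    ]
}

flat = flatten(doc, "comments")
print(flat)

# Flattened view: "John" and 3 are both present, so the document matches
# even though John never gave a rating of 3.
flattened_hit = "John" in flat["comments.author"] and 3 in flat["comments.rating"]

# Nested view: both conditions must hold inside the same array element.
nested_hit = any(
    c["author"] == "John" and c["rating"] == 3 for c in doc["comments"]
)
print(flattened_hit, nested_hit)

# Index mapping that keeps each comment as its own hidden document.
nested_mapping = {
    "mappings": {
        "properties": {
            "comments": {
                "type": "nested",
                "properties": {
                    "author": {"type": "keyword"},  # assumed type
                    "rating": {"type": "integer"},  # assumed type
                },
            }
        }
    }
}
print(json.dumps(nested_mapping, indent=2))
```

The flattened check produces a false positive, while the per-element check does not; the nested type gives queries the second behavior.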
{
  "query": {
    "nested": {
      "path": "comments",
      "query": {
        "bool": {
          "must": [
            { "term": { "comments.author": "John" }},
            { "term": { "comments.rating": 3 }}
          ]
        }
      }
    }
  }
}
6. Fuzzy Query - Fuzzy Search
Fault-tolerant search that allows for spelling errors.
- Recommended Practice: For text fields, prefer the match query with the fuzziness parameter over the fuzzy query itself, because match runs the input through the analyzer first, which usually fits search requirements better.
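The two approaches can be compared side by side in this sketch. The field name title and the misspelled search text are assumptions for illustration.

```python
import json

# Recommended: match analyzes the input (tokenizing, lowercasing, ...)
# and then applies fuzziness to each resulting term.
fuzzy_match = {
    "query": {
        "match": {
            "title": {  # hypothetical text field
                "query": "Elasticsaerch Guide",
                "fuzziness": "AUTO",
            }
        }
    }
}

# Term-level alternative: the value is not analyzed, so it has to be
# close to a term exactly as it is stored in the index.
term_level_fuzzy = {
    "query": {
        "fuzzy": {
            "title": {
                "value": "elasticsaerch",
                "fuzziness": "AUTO",
            }
        }
    }
}

print(json.dumps(fuzzy_match, indent=2))
print(json.dumps(term_level_fuzzy, indent=2))
```

Note that the match version takes a multi-word, mixed-case input, while the fuzzy version works on a single term.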
7. Regexp Query - Regular Expression Search
Among the slowest query types; avoid it whenever possible.
- Anchor Limitations: Does not support the ^ and $ anchor operators; the regular expression must match the entire string.
- Special Characters: To match a special character such as #, escape it with a backslash; inside a JSON request body the backslash itself must be doubled (e.g., \\#).
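The escaping rule is easiest to see by letting a JSON serializer do the doubling. In this sketch the field name tag is an assumption (taken to be a keyword field), and the pattern is only an example.

```python
import json

# The regular expression itself contains ONE backslash before "#".
# No ^ or $ is needed: the pattern must match the whole term anyway.
pattern = r"\#[a-z0-9]+"

regexp_query = {
    "query": {
        "regexp": {
            "tag": {  # hypothetical keyword field
                "value": pattern,
            }
        }
    }
}

# Serializing to JSON doubles the backslash, giving \\# in the body.
body = json.dumps(regexp_query)
print(body)
```

When the request body is written by hand instead, the doubled backslash has to be typed explicitly.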
Change Log
- 2025-11-04 Initial document created.
